We mainly focus on this approach today for simplicity
Machine Learning-based
Trains machine learning models on labeled data to predict sentiment
e.g., “I love AI!” → positive, “AI is scary” → negative
More complex but powerful
Advanced AI
LLMs (e.g., ChatGPT, DeepSeek)
They analyze context for higher accuracy, but are more complex
Background: Sentiment Analysis and AI in Society
Connection to AI and Society:
Sentiment analysis reveals public attitudes toward AI and products, helping understand its societal impact.
Examples:
Companies analyze tweets to improve their products
Governments study comments to gauge public satisfaction with their services
Researchers explore how AI in healthcare is perceived
e.g. trust in AI diagnostics
By analyzing text, we learn what excites or worries people, driving further development to benefit society
Hands-on Workshop
Step 1: Introduction and Setup
Objective: Set up RStudio
Task 1.1: Open RStudio
Open RStudio
Create a new R script: File > New File > R Script
Save as sentiment_workshop.R if needed
Task 1.2: Install Packages
Run in the console
Code
# Install necessary packages for sentiment analysis
install.packages(c("tidyverse", "tidytext", "textdata"))
note: tidyverse for data tasks; tidytext for text analysis; textdata for sentiment dictionaries
Step 2: Loading Tools and Data
Objective: Load R packages and a dataset of comments
Dataset
Fictional social media comments about AI’s societal impact
Code
library(tidyverse)
library(tidytext)
library(textdata)

comments <- tibble(
  id = 1:30,
  text = c(
    "AI is amazing and will make education so much better!",
    "I’m worried AI will take over jobs and leave people unemployed.",
    "AI helps doctors save lives, it’s a game-changer.",
    "I don’t trust AI, it feels creepy and invasive.",
    "AI is okay, but it needs regulation to be safe.",
    "AI in schools is cool, but it’s not perfect.",
    "Wow, AI is so great, it’ll solve all our problems… yeah, right!", # Sarcasm
    "AI makes healthcare faster and more accurate, love it!",
    "Why does AI know so much about me? It’s unsettling.",
    "AI chatbots are fun to talk to, but sometimes useless.",
    "AI in movies is awesome, makes everything so realistic!",
    "I’m scared AI will control everything one day.",
    "AI helps me study better, it’s like a personal tutor.",
    "AI is overhyped, it’s not as smart as people think.", # Mixed
    "Using AI for art is creative and inspiring!",
    "AI in cars? No way, I don’t trust self-driving tech.",
    "AI makes my phone so smart, it’s incredible!",
    "I feel like AI is watching me all the time, creepy.",
    "AI in gaming makes battles so epic, I’m hooked!",
    "AI might replace teachers, and that’s not cool.",
    "AI saves time at work, but I miss human interaction.",
    "AI’s fine, but it makes mistakes sometimes.", # Neutral
    "AI in music creation is a total game-changer!",
    "I’m skeptical about AI making fair decisions.",
    "AI is great, but only if it’s used ethically.", # Mixed
    "AI makes life easier, but it’s a bit scary too.", # Mixed
    "AI in agriculture boosts crops, amazing stuff!",
    "I don’t get why everyone loves AI so much.", # Negative
    "AI tutors are helpful, but they don’t replace real teachers.",
    "AI sounds cool, but I’m not sure it’s safe." # Mixed
  )
)
Task 2.1: Run the Code
Run the code (highlight and press Ctrl+Enter)
Task 2.2: View the Data
print(comments)
Task 2.3: View comments
Run View(comments) in the console
How many comments are there?
Step 3: Exploring the Dataset
Objective: Understand the dataset’s structure
Code
colnames(comments)
nrow(comments)
comments$text[1]
Task 3.1: Count the Columns
Run ncol(comments) to check how many columns are in the dataset
Task 3.2: View Specific Comments
Run comments$text[4] to see the fourth comment
Step 4: Splitting Text into Words
Objective: Learn tokenization to break text into words
Tokenization splits sentences into words
e.g. “AI is cool” → {“AI,” “is,” “cool”}
Words are the building blocks for (most) sentiment analysis
Code
words <- comments %>%
  unnest_tokens(word, text)
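For intuition, unnest_tokens() roughly lowercases the text, strips punctuation, and splits on whitespace; a base-R sketch of the same idea (not the actual implementation):

```r
text <- "AI is cool!"

# Lowercase, drop punctuation, then split on whitespace
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
tokens  # "ai" "is" "cool"
```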
Task 4.1: View Words
print(words)
Task 4.2: How many words are there?
Run nrow(words)
Task 4.3: View First 5 Words
Run head(words, 5)
Task 4.4: How many unique words are there?
Run n_distinct(words$word)
Task 4.5: View Most Common Words
Run words %>% count(word, sort = TRUE) %>% head(10)
Task 4.6: How many times does “better” appear?
Run words %>% filter(word == "better")
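Most of the top words from Task 4.5 are stop words (“ai,” “is,” “the”) that carry little sentiment. tidytext ships a stop_words tibble you can remove with anti_join(); a self-contained sketch using a tiny stand-in for the words tibble built in Step 4:

```r
library(dplyr)
library(tidytext)  # provides the stop_words tibble

# A tiny stand-in for the `words` tibble built in Step 4
words <- tibble(word = c("ai", "is", "amazing", "and", "ai", "the", "creepy"))

# anti_join() keeps only rows whose word is NOT in stop_words,
# so the frequency table highlights content words
word_counts <- words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
word_counts
```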
Step 5: Exploring Sentiment Lexicons
Objective: Understand how lexicons assign sentiment scores
A lexicon is a dictionary scoring words’ emotions
AFINN: -5 to +5
e.g. “happy” = +3, “scary” = -2
Alternatives
Bing:
a binary classification: positive/negative
NRC:
emotion-based (e.g. joy, anger) and positive/negative classifications
Lexicon-based sentiment analysis uses these dictionaries to quantify feelings in text
Step 5: Exploring Sentiment Lexicons (continued)
We will use the AFINN lexicon, which assigns scores to words based on their sentiment
Positive words have positive scores, negative words have negative scores
Words that do not appear in the lexicon receive no score and are effectively neutral
Code
afinn <- get_sentiments("afinn")
Task 5.1: View Lexicon
Run: head(afinn, 10)
Task 5.2: Check Scores for Specific Words
Run: afinn %>% filter(word == "trust")
What’s its score?
Run: afinn %>% filter(word == "bad")
Guess the score for “awesome”
List two words you think are negative
Step 6: Scoring Words for Sentiment
Objective: Assign sentiment scores to words
Match dataset words to AFINN lexicon scores
Only words in the lexicon get scores
Code
sentiment_scores <- words %>%
  inner_join(afinn, by = "word")
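inner_join() silently drops any word that is not in the lexicon. To see what was left out, anti_join() is the complement; a self-contained sketch with a toy lexicon (the words and scores here are illustrative, not real AFINN entries):

```r
library(dplyr)

# Toy lexicon and word list; scores are made up for illustration
lexicon <- tibble(word = c("amazing", "creepy"), value = c(4, -2))
toy_words <- tibble(id = c(1, 1, 2), word = c("ai", "amazing", "creepy"))

# inner_join() keeps only the words found in the lexicon...
scored <- toy_words %>% inner_join(lexicon, by = "word")

# ...while anti_join() shows the words that were silently dropped
unscored <- toy_words %>% anti_join(lexicon, by = "word")
```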
Task 6.1: View Scores
Run: print(sentiment_scores)
List one positive and one negative word
Task 6.2: Count Negative Words
Run: sentiment_scores %>% filter(value < 0)
Task 6.3: Count Positive Words
Run: sentiment_scores %>% filter(value > 0)
Step 7: Summarizing Comment Sentiment
Objective: Calculate total sentiment for each comment
Sum word scores per comment to get its overall sentiment
Positive total = sum of the positive word scores
Negative total = sum of the absolute values of the negative word scores
sentiment = positive total - negative total
Code
comment_sentiment <- sentiment_scores %>%
  group_by(id) %>%
  summarize(total_score = sum(value)) %>%
  right_join(comments, by = "id") %>%
  arrange(id)
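The positive-minus-negative arithmetic can be checked by hand; a base-R sketch with illustrative scores (not actual AFINN values) for a single comment:

```r
# Illustrative word scores for one comment (not real AFINN values)
scores <- c(helps = 2, save = 2, creepy = -2)

positive_total <- sum(scores[scores > 0])   # 2 + 2 = 4
negative_total <- sum(-scores[scores < 0])  # 2
sentiment <- positive_total - negative_total
sentiment  # 4 - 2 = 2, the same as sum(scores)
```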
Task 7.1: View Results
Run print(comment_sentiment)
Which comment has the lowest score?
Task 7.2: Sort by Total Score
Run comment_sentiment %>% arrange(desc(total_score))
Which is most positive?
Task 7.3: Check Comment 18’s Score
Read comment 18’s text and score
Do they match?
Task 7.4: Check Neutral Comments
Run comment_sentiment %>% filter(total_score == 0)
Run comment_sentiment_bing %>% filter(total_score > 0)
Run comment_sentiment_nrc %>% filter(total_score > 0)
Which comments are positive by Bing? Which by NRC?
Task 9.2: Visualize Bing Results
Create a bar plot of the Bing sentiment labels
Use geom_bar() to show the count of each sentiment label
Code
library(ggplot2)
ggplot(comment_sentiment_bing, aes(x = sentiment)) +
  geom_bar(fill = "blue", alpha = 0.5) +
  labs(title = "Bing Lexicon: Histogram of Sentiment", x = "Sentiment", y = "Count") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  theme_minimal()
Step 10: Comparing AFINN and Bing
Task 10.1 Comparing Bing with AFINN
Compare Bing and AFINN results
Create a comparison dataframe with both lexicons
Use left_join() to merge AFINN and Bing results by comment ID
Identify comments where Bing and AFINN disagree
Code
comparison_df2 <- comments %>%
  left_join(comment_sentiment_bing %>% select(id, sentiment), by = "id") %>%
  rename(sentiment_bing = sentiment) %>%
  left_join(comment_sentiment %>% select(id, sentiment), by = "id") %>%
  rename(sentiment_afinn = sentiment)
comparison_df2

# Show the comments where Bing and AFINN disagree
comparison_df2 %>%
  filter(sentiment_bing != sentiment_afinn | is.na(sentiment_bing) != is.na(sentiment_afinn))
Step 10: Comparing AFINN and Bing (continued)
Task 10.2: Visualize AFINN vs Bing
Create a bar plot comparing AFINN and Bing sentiments
Use geom_bar() to show counts of each sentiment per comment
Code
# Reshape the data to long format for plotting
comparison_long <- comparison_df2 %>%
  select(id, sentiment_afinn, sentiment_bing) %>%
  pivot_longer(
    cols = c(sentiment_afinn, sentiment_bing),
    names_to = "lexicon",
    values_to = "sentiment"
  ) %>%
  mutate(lexicon = recode(lexicon, sentiment_afinn = "AFINN", sentiment_bing = "Bing"))

# Create a grouped bar plot to compare sentiment distributions
ggplot(comparison_long, aes(x = sentiment, fill = lexicon)) +
  geom_bar(position = "dodge", alpha = 0.5) +
  geom_text(
    stat = "count",
    aes(label = after_stat(count), group = lexicon),
    position = position_dodge(width = 0.45),
    vjust = -0.5
  ) +
  labs(
    title = "Comparison of Sentiment Labels: AFINN vs Bing",
    x = "Sentiment",
    y = "Count",
    fill = "Lexicon"
  ) +
  scale_fill_manual(values = c("AFINN" = "blue", "Bing" = "red")) +
  theme_minimal()
Step 11: Sentiment Analysis with Ollama
Objective: Use Ollama with Llama 3.2:3b to perform sentiment analysis
Ollama runs large language models (LLMs) like Llama 3.2:3b locally
offering nuanced sentiment analysis by understanding context
Code
install.packages("ollamar")
Load Ollama
Code
library(ollamar)
test_connection()
list_models()
# pull("llama3.2:3b") # download the model used below (equivalent bash: ollama pull llama3.2:3b)
Testing
Code
# generate a response/text based on a prompt; returns an httr2 response by default
resp <- generate(model = "llama3.2:3b", prompt = "tell me a 50-word story")
resp

# get just the text from the response object
resp_process(resp, "text")

# get the text as a tibble dataframe
resp_process(resp, "df")
Step 11: Sentiment Analysis with Ollama (continued)
Define the function to get sentiment using Ollama
Code
get_sentiment_ollama <- function(text) {
  prompt <- paste("Classify the sentiment of the following text as Positive, Negative, or Neutral, and respond with only the label:", text)
  response <- generate(model = "llama3.2:3b", prompt = prompt, output = "text")
  return(response)
}
Task 11.1: Test the Function
Run get_sentiment_ollama("AI is amazing and will make education so much better!")
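LLM replies sometimes arrive with stray whitespace or inconsistent casing; if yours do, a small hypothetical helper can normalize the label before comparing it with lexicon results:

```r
# Hypothetical helper: normalize an LLM label like " Positive\n" to "positive"
normalize_label <- function(x) {
  tolower(trimws(x))
}

normalize_label(" Positive\n")  # "positive"
```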
Step 3: Compare the AFINN, Bing, and NRC Lexicons
Code
afinn <- get_sentiments("afinn")
bing <- get_sentiments("bing")
nrc <- get_sentiments("nrc")

# Join and score with AFINN
real_sentiment_afinn <- real_words %>%
  inner_join(afinn, by = "word") %>%
  group_by(id) %>%
  summarize(total_score = sum(value, na.rm = TRUE))

# Join and score with Bing
real_sentiment_bing <- real_words %>%
  inner_join(bing, by = "word") %>%
  group_by(id, sentiment.y) %>%
  summarize(word_count = n(), .groups = "drop") %>%
  pivot_wider(names_from = sentiment.y, values_from = word_count, values_fill = 0)

# Join and score with NRC (positive/negative)
real_sentiment_nrc <- real_words %>%
  inner_join(nrc %>% filter(sentiment %in% c("positive", "negative")), by = "word") %>%
  group_by(id, sentiment.y) %>%
  summarize(word_count = n(), .groups = "drop") %>%
  pivot_wider(names_from = sentiment.y, values_from = word_count, values_fill = 0)
Step 4: Visualize Results
Code
# AFINN
library(ggplot2)
ggplot(real_sentiment_afinn, aes(x = id, y = total_score)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "AFINN Sentiment Scores", x = "Comment ID", y = "Score") +
  theme_minimal()

# Bing
ggplot(real_sentiment_bing, aes(x = id)) +
  geom_bar(aes(y = positive), stat = "identity", fill = "blue", alpha = 0.5) +
  geom_bar(aes(y = -negative), stat = "identity", fill = "red", alpha = 0.5) +
  labs(title = "Bing Lexicon: Positive vs Negative", x = "Comment ID", y = "Word Count") +
  theme_minimal()
Step 5: (Optional) Use Ollama/LLM for Sentiment
Code
# If you have Ollama and Llama 3 installed:
library(ollamar)

get_sentiment_ollama <- function(text) {
  prompt <- paste("Classify the sentiment of the following text as positive, negative, or neutral, and respond with only the label in lower case:", text)
  response <- generate(model = "llama3.2:3b", prompt = prompt, output = "text")
  return(response)
}

real_comments <- real_comments %>%
  mutate(sentiment_ollama = map_chr(text, get_sentiment_ollama))
Step 6: Compare and Discuss
Compare lexicon and LLM results
Which method best handles sarcasm, mixed emotions, or context?
Write a short paragraph (3–5 sentences) on your findings